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RELAIIONAL DATABASE MANAGEMENT 

SYSTEM HAVING INTEGRATED NON- 
RELATIONAL MULTI-DIMENSIONAL DATA 
STORE OF AGGREGATED DATA 
ELEMENTS 

RELATED CASES 

This is a Cootiouation-in-part of: copendiog application 
Ser. No. 09/514,611 enliUed "Stand-Alone Cartridge-Slyle 
Data Aggregation Server and Method of and System for 
Managing Multi-Dimensionai Databases using the Same" 
filed Feb, 28, 2000, and copending application Sen No. 
09/368,241 entitled "Method Of And System For Managing 
Multi-Dimensional Databases Using Modular- Arithmetic 
Based Address Data Mapping Processes" filed Aug. 4, 1999; 
said Applications being commonly owned by HyperRoU 
Israel, Limited, herein incorporated by reference in their 
entirety. 

BACKGROUND OF THE INVENTION 

1. Field of Invention 

The present invention relates generally to multi- 
dimensional relational databases and, more specifically to 
mechanisms for aggregating data elements in a multi- 
dimensional relational database system and for processing 
queries on such aggregated data elements, and also to 
informational database systems that utilize multi- 
dimensional relational databases and such aggregation/ 
query mechanisms. 

2. Brief Description of the State of the Art 
Information technology (IT) enables an enterprise to 

manage and optimize its internal business practices through 
the analysis and sharing of data internally within the enter- 
prise. In addition, IT enables an enterprise to manage and 
optimize its external business practices through the sharing 
of data with external parties such as suppliers, customers and 
investors, and through on-line transactions between the 
enterprise and external parties. Informational database sys- 
tems (systems that store data, support query processing on 
the stored data, and possibly support analysis of the stored 
data) play a central role in many dififerent parts of today's IT 
systems. 

FIG. 1 illustrates exemplary domains where informational 
database systems are used. As shown, an operational envi- 
ronment 10 generates data which is stored in a data store 22 
in the informational database system 20. These domains 
include data analysis systems (sprcad-sbcct modeling 
programs, snap-shots, extraction, dcnormalization), data 
warehousing, data marts, CLAP systems, data mining 
systems, electronic commerce-enabled web servers, and 
business- to-business exchanges. Modern informational 
database systems typically use a relational database man- 
agement system (RDBMS) as a repository for storing the 
data and querying the data. 

FIG. 2 illustrates a data warehouse-OLAP domain that 
utilizes the prior art approaches described above. The data 
warehouse is an enterprise- wide data store. It is becoming an 
integral part of many information delivery systems because 
it provides a single, central location where a reconciled 
version of data exuacted from a wide variety of operational 
systems is stored. Details on methods of data integration and 
constructing data warehouses can be found in the white 
paper entitled "Data Integration: The Warehouse Founda- 
tion" by Louis Rollleigh and Joe Thomas, published at 
http://www.acxiom.com/whitepapcrs/wp-ll.asp. Building a 



35,604 Bl 

2 

Data Warehouse has its own special challenges (e.g. using 
common data model, common business dictionary, etc.) and 
is a complex endeavor. However, just having a Data Ware- 
house does not provide organizations with the often- 

5 heralded business benefits of data warehousing. To complete 
the supply chain from transactional systems to decision 
maker, organizations need to deliver systems that allow 
knowledge workers to make strategic and tactical decisions 
based on the information stored in these warehouses. These 
decision support systems are referred to as On-line Ana- 
lytical Processing (OLAP) systems. Such OLAP systems are 
commonly classified as Relation OLAP systems or Multi- 
Dimensional OLAP systems. 

The Relational OLAP (ROLAP) system accesses data 
stored in a Data Warehouse to provide OLAP analyses. The 
premise of ROLAP is that OLAP capabilities are best 
provided directly against the relational database, i.e. the 
Data Warehouse. The ROLAP architecture was invented to 
enable direct access of data from Data Warehouses, and 
therefore support optimization techniques to meet batch 
window requirements and provide fast response times. 
Typically, these optimization techniques include 
application-level table partitioning, pre-aggregate 
inferencing, denormalization support, and the joining of 

2^ multiple fact tables. 

A typical ROLAP system has a three-tier or layer client/ 
server architecture. The "database layer*' utilizes relational 
databases for data storage, access, and retrieval processes. 
The "application logic layer" is the ROLAP engine which 

3Q executes the multidimensional reports from multiple users. 
The ROLAP engine integrates with a variety of "presenta- 
tion layers," through which users perform OLAP analyses. 
After the data model for the data warehouse is defined, data 
from on-line transaction-processing (OLTP) systems is 

35 loaded into the relational database management system 
(RDBMS). If required by the data model, database routines 
are run to pre-aggregate the data within the RDBMS. Indices 
are then created to optimize query access times. End users 
submit muhidimensional analyses to the ROLAP engine, 

4Q which then dynamically transforms the requests into SQL 
execution plans. The SQL execution plans are submitted to 
the relational database for processing, the relational query 
results are cross-tabulated, and a multidimensional result 
data set is returned to the end user. ROLAP is a fully 

45 dynamic architecture capable of utilizing pre-calculated 
results when they are available, or dynamically generating 
results from the raw information when necessary. 

llie Multidimensional OLAP (MOLAP) systems utilize a 
MDD or "cube" to provide OLAP analyses. Tlie main 

50 premise of this architecture is that data must be stored 
multidimensiooally to be accessed and viewed multidimen- 
sionally. Such non-relational MDD data structures typically 
can be queried by users to enable the users to "slice and 
dice" the aggregated data. As shown in FIG. 2, such MOLAP 

55 systems have an Aggregation module which is responsible 
for all data storage, access, and retrieval processes, including 
data aggregation (i.e. pre-aggregation) in the MDDB, and an 
analytical processing and GUI module responsible for inter- 
facing with a user to provide analytical analysis, query input, 

go and reporting of query results to the user. 

A more detailed description of the data warehouse and 
OLAP environment may be found in copending U.S. patent 
application No. 09/514,611 to R, Bakalash, G. Shaked, and 
J. Caspi, commonly assigned to HyperRoll Israel, Limited, 

65 incorporated by reference above in its entirety. 

In a RDBMS, users view data stored in tables. By 
contrast, users of a non-relation database system can view 
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Other data structures, either instead of or in addition to the included in such an implementation. Each row within the 

tables of the RDBMS system. FIG. 3A ilhistrates an exem- central fact table includes a multi-part key associated with a 

plary table in an RDBMS; and FIGS. 3B and 3C illustrate set of facts (in this example, a number representing a 

operators (queries) on the table of FIG. 3A, and the result of quantity). The multi-part key of a given row (values stored 

such queries, respectively. The operators illustrated in FIGS. $ in the S#,P#,TP# fields as shown) points to rows (instances) 

3B and 3C are expressed as Structured Query Language stored in the dimension tables described above. A more 

(SQL) statements as is conventional in the art. detailed description of star schemas and the tables used to 

The choice of using a RDBMS as the data repository in implement star sdiemas may be found in C. J. Date, "An 

information database systems naturally stems from the reali- Introduction to Database Systems," Seventh Edition, 

ties of SQL standardization, the weahh of RDBMS-related Addison-Wesley, 2000, pp. 711-715, herein incorporated by 

tools, and readily available expertise in RDBMS systems. reference in its entirety. 

However, the querying component of RDBMS technology When processing a query, the tables that implement the 
suffers from performance and optimization problems stem- schema are accessed to retrieve the facts that match the 
ming from the very nature of the relational data model. More query. For example, in a star schema implementation as 
specifically, during query processing, the relational data described above, the facts are retrieved from the central fact 
model requires a mechanism that locates the raw data table and/or the dimension tables. Locating the facts that 
elements that match the query. Moreover, to support queries match a given query involves one or more join operations, 
that involve aggregation operations, such aggregation opera- Moreover, to support queries that involve aggregation 
tions must be performed over the raw data elements that operations, such aggregation operations must be performed 
match the query. For large multi-dimensional databases, a 20 over the facts that match the query. For large multi- 
naive implementation of these operations involves compu- dimensional databases, a naive implementation of these 
tational intensive table scans that leads to unacceptable operations involves computational intensive table scans that 
query response times. typically leads to unacceptable query response times. 

In order to better understand how the prior art has Moreover, since the fact tables are pre-summarized and 

approached this problem, it will be helpful to briefly 25 aggregated along business dimensions, these tables tend to 

describe the relational database model. According to the be very large. This point becomes an important consider- 

relational database model, a relational database is repre- alion of the performance issues associated with star sche- 

sented by a logical schema and tables that implement the mas. A more detailed discussion of the performance issues 

schema. The logical schema is represented by a set of (and proposed approaches that address such issues) related 

templates that define one or more dimensions (entities) and 30 lo joining and aggregation of star schema is now set forth, 

attributes associated with a given dimension. The attributes The first performance issue arises from computationally 

associated with a given dimension includes one or more intensive table scans that are performed by a naive imple- 

attributes that distinguish it from every other dimension in mentation of data joining. Indexing schemes may be used to 

the database (a dimension identifier). Relationships amongst bypass these scans when performing joining operations.^ 

dimensions are formed by joining attributes. The data struc- 35 Such schemes include B-tree indexing, inverted list indexing 

ture that represents the set of templates and relations of the and aggregate indexing. A more detailed description of such 

logical schema is typically referred to as a catalog or indexing schemes can be found in "The Art of Indexing'*, 

dictionary. Note that the logical schema represents the Dynamic Information Systems Cbiporation, October 1999, 

relational organization of the database, but does not hold any available at bttp://www.disc.com/artindex.pdf. All of these 

fact data per se. This fact data is stored in tables that 40 indexing schemes replaces table scan operations (involved 

implement the logical schema. in locating the data elements that match a query) with one 

Star schemas are fi^equently used to represent the logical ore more index lookup operation. Inverted list indexing 

structure of a relational database. The basic premise of star associates an index with a group of data elements, and stores 

schemas is that information can be classified into two (at a location identified by the index) a group of pointers to 

groups: facts and dimensions. Facts are the core data ele- 45 the associated data elements. During query processing, in 

ments being analyzed. For example, units of individual item the event that the query matches the index, the pointers 

sold are facts, while dimensions are attributes about the stored in the index are used to retrieve the corresponding 

facts. For example, dimensions are the product types pur- data elements pointed therefrom. Aggregation indexing inte- 

chased and the data purchase. Business questions against grates an aggregation index with an inverted hst index to 

this schema are asked looking up specific facts (UNITS) 50 provide pointers to raw data elements that require 

through a set of dimensions (MARKETS, PRODUCTS, aggregation, thereby providing for dynamic summarization 

PERIOD). The central fact table is typically much larger of the raw data elements that match the user-submitted 

than any of its dimension tables. query. 

An exemplary star schema is illustrated in FIG. 4A for These indexing schemes arc intended to improve join 

suppliers (the "Supplier" dimension) and parts (the "Parts" 55 operations by replacing table scan operations with one or 

dimension) over time periods (the "Time-Period" more index lookup operation in order to locate the data 

dimension). It includes a central fact table "Supplied-Parls" elements that match a query. However, these indexing 

that relates to multiple dimensions — the "Supplier", "Parts" schemes suffer from various performance issues as follows: 

and "Time-Period" dimensions. FIG. 4B illustrates the Since the tables in the star schema design typically 

tables used to implement the star schema of FIG. 4A. More 60 contain the entire hierarchy of attributes (e.g. in a 

specifically, these tables include a central fact table and a PERIOD dimension, this hierarchy could be 

dimension table for each dimension in the logical schema of day>week>month>quarter>yearX a multipart key of 

FIG. 4A. A given dimension table stores rows (instances) of day, week, month, quarter, year has to be created; thus, 

the dimension defined in the logical schema. For the sake of multiple meta-data definitions are required (one of each 

description, FIG. 4B illustrates the dimension table for the 65 key component) to define a single relationship; this 

"Hme-Pcriod" dimension only. Similar dimension tables for adds to the design complexity, and sluggishness in 

the ''Supplier" and ''Part" dimensions (not shown) arc also performance. 
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Additionordeletionof levels in the hierarchy will require erates multiple joins. Unfortunately, parallelism can only 

physical modification of the fact table, which is time reduce, not eliminate, the performance degradation issues 

consuming process that limits flexibility. related to the star schema. 

Carrying all the segments of the compound dimensional One of the most fiindamental principles of the multidi- 

key in the fact table increases the size of the index, thus 5 mensional database is the idea of aggregation. The most 

impacting both performance and scalability. common aggregation is called a roll-up aggregation. This 

Another performance issue arises from dimension tables type is relatively easy to compute: e.g. taking daily sales 

that contain multiple hierarchies. In such cases, the dimen- totals and rolling them up into a monthly sales table. The 

sional table often includes a level of hierarchy indicator for more difficult are analytical calculations, the aggregation of 

every record. Every retrieval from fact table that stores lo Boolean and comparative operators. However these are also 

details and aggregates must use the indicator to obtain the considered as a subset of aggregation, 

correct result, which impacts performance. ITie best alter- In a star schema, the results of aggregation are summary 

native to using the level indicator is the snowflake schema. tables, 'lypically, summary tables are generated by database 

In this schema aggregate tables are created separately from administrators who attempt to anticipate the data aggrega- 

the detail tables. In addition to the main fact tables, snow- is tions that the users will request, and then pie-build such 

flake schema contains separate fact tables for each level of tables. In such systems, when processing a user-generated 

aggregation. Notably, the snowflake schema is even more query that involves aggregation operations, the pre-built 

complicated than a star schema, and often requires multiple aggregated data that matches the query is retrieved from the 

SQL statements to get the results that are required. summary tables (if such data exists). FIGS. 5A and 5B 

Another performance issue arises from the pairwise join 20 illustrate a multi-dimensional relational database using a star 

problem. Traditional RDBMS engines are not design for the schema and summary tables. In this example, the summary 

rich set of complex queries that are issued against a star tables are generated over the '"time" dimension storing 

schema. The need to retrieve related information from aggregated data for "month", "quarter" and "year" time 

several tables in a single query — ^'join processing" — is periods as shown in FIG. 5B. Sununary tables are in essence 

severely limited. Many RDBMSs can join only two tables at 25 additional fact tables, of higher levels. They are attached to 

a time. If a complex join involves more than two tables, the the basic fact table creating a snowflake extension of the star 

RDBMS needs to break the query into a series of pairwise schema. There are hierarchies among summary tables 

joins. Selecting the order of these joins has a dramatic because users at different levels of management require 

performance impact. There are optimizers that spend a lot of different levels of summarization. Choosing the level of 

CPU cycles to find the best order in which to execute those 30 aggregation is accomplished via the "drill-down" feature, 

joins. Unfortunately, because the number of combinations to Summary tables containing pre-aggregated results typi- 

be evaluated grows exponentially with the number of tables cally provide for improved query response time with respect 

being joined, the problem of selecting the best order of to on-the-fly aggregation. However, summary tables suffer 

pairwise joins rarely can be solved in a reasonable amount from some disadvantages: 

of time. 35 summary tables require that database administrators 
Moreover, because the number of combinations is often anticipate the data aggregation operations that users 
too large, optimizers limit the selection on the basis of a will require; this is a difiScuU task in large multi- 
criterion of directly related tables. In a star schema, the fact dimensional databases (for example, in data ware- 
table is the only table directly related to most other tables, houses and data mining systems), where users always 
meaning that the fact table is a natural candidate for the first 40 need to query in new ways looking for new information 
pairwise join. Unformnately, the fact table is the very largest and patterns, 

table in the query, so this strategy leads to selecting a summary tables do not provide a mechanism that aUows 

painvise jom order that generates a very large intermediate efficient drill down to view the raw data that makes up 

result set, severely affectmg query performance. summary table-typically a table scan of one or 

This IS an optimization strategy, typically referred to as as ^^^^ ^^^^ j^^les is required. 

Cartesian Joins, that lessens the performance impact of the „ . • i • t. i 

pairwise join problem by Mng joining of unrelated ^'^I^S ^ ^^^^y^^ ^^'^ pie-aggregation is completed, 

tables. The join to the fact table, which is the largest one, is »f ^ ^'^^^^y overhead because the vast majority 

deferred until the very end, thus reducing the size of ^^"^ generated informaUon remains unvisited. 

intermediate result sets. In a join of two unrelated tables 50 ^^ere is a need to synchronize the summary tables before 
every combination of the two tables' rows is produced, a 

Cartesian product. Such a Cartesian product improves query the degree of viable parallelism is limited because the 

performance. However, this strategy is viable only if the subsequent levels of summary tables must be per- 

Cartesian product of dimension rows selected is much formed in pipeline, due to their hierarchies, 

smaller than the number of rows in the fact table. The 55 for very large databases, this option is not valid because 

multiplicative nature of the Cartesian join makes the opti- of time and storage space. 

mization helpful only for relatively small databases. Note that it is common to utilize both pre-aggregated results 

In addition, systems that exploit hardware and software and on-the-fly aggregation in support aggregation. In these 

parallelism have been developed that lessens the perfor- system, partial pre-aggregation of the facts results in a small 

mance issues set forth above. Parallelism can help reduce the 60 set of summary tables. 0n4he-fly aggregation is used in the 

execution time of a single query (speed-up), or handle case the required aggregated data does not exist in the 

additional Work without degrading execution time (scale- summary tables. 

up)4}*] example. Red Bricku has developed STAR- Note that in the event that the aggregated data does not 

join''^ technology that provides high speed, parallelizable exist in the summary tables, table join operations and 

multi-table joins in a single pass, thus allowing more than 65 aggregation operations are performed over the raw facts in 

two tables can be joined in a single operation. The core order to generate such aggregated data. This is typicaUy 

technology is an innovative approach to indexing that aocel- referred to as on-the-fly aggregation. In such instances, 
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aggregation indexing is used to mitigate the performance of 
multiple data joins associated with dynamic aggregation of 
the raw data. Thus, in large multi-dimensional databases, 
such dynamic aggregation may lead to unacceptable query 
response times. s 

In view of the problems associated with joining and 
aggregation within RDBMS, prior art ROLAP systems have 
suffered from essentially the same shortcomings and draw- 
backs of their underlying RDBMS. 

While prior art MOLAP systems provide for improved 20 
access time to aggregated data within their underlying MDD 
structures, and have performance advantages when carrying 
out joining and aggregations operations, prior art MOLAP 
architectures have suffered from a number of shortcomings 
and drawbacks which Applicants have detailed in their 15 
copending U.S. application Ser. Nos. 09/368,241 and 
09/514,611 incorporated herein by reference. 

In summary, such shortcomings and drawbacks stem from 
the fact that there is unidirectional data flow from the 
RDBMS to the MOLAP system. More specifically, atomic 20 
(raw) data is moved, in a single transfer, to the MOLAP 
system for aggregation, analysis and querying. Importantly, 
the aggregation results are external to the RDBMS. Thus, 
users of the RDBMS cannot directly view these results. Such 
results are accessible only from the MOLAP system. 25 
Because the MDD query processing logic in prior art 
MOLAP systems is separate from that of the RDBMS, users 
must procure rights to access to the MOLAP system and be 
instructed (and be carefiil to conform to such instmctions) to 
access the MDD (or the RDBMS) under certain conditions. 30 
Such requirements can present security issues, highly unde- 
sirable for system administration. Satisfying such require- 
ments is a costly and logistically cumbersome process. As a 
result, the widespread applicability of MOLAP systems has 
been limited. 35 

Thus, there is a great need in the art for an improved 
mechanism for joining and aggregating data elements within 
a relational database management system, and for integrat- 
ing the improved relational database management system 
into informational database systems (including the data 40 
warehouse and OLAP domains), while avoiding the short- 
comings and drawbacks of prior art systems and method- 
ologies. 

SUMMARY AND OBJECTS OF PRESENT 45 
INVENTION 

Accordingly, it is an object of the present invention to 
provide an improved method of and system for joining and 
aggregating data elements integrated within a relational 
database management system (RDBMS) using a noo- 
relational multi-dimensional data structure (MDD) achiev- 
ing a significant increase in system performance (e.g. 
[deceased] decreased access/search time), user flexibility 
and ease of use. 

Another object of the present invention is to provide such 
an RDBMS wherein its integrated data aggregation module 
supports high-performance aggregation (i.e. data roll-up) 
processes to maximize query performance of large data 
volumes. 

ou 

Another object of the present invention is to provide such 
an RDBMS system, wherein its integrated data aggregation 
(i.e. roll-up) module speeds up the aggregation process by 
orders of magnitude, enabling larger database analysis by 
lowering the aggregation times. ^5 

Another object of the present invention is to provide such 
a novel RDBMS system for use in OLAP operations. 
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Another object of the present invention is to provide a 
novel RDBMS system having an integrated aggregation 
module that carries out an novel roUup (i.e. down-up) and 
spread down (i.e. top-down) aggregation algorithms. 

Another object of the present invention is to provide a 
novel RDBMS system having an integrated aggregation 
module that carries out full pre-aggregation aod/or *'on-the- 
fly" aggregation processes. 

Another object of the present invention is to provide a 
novel RDBMS system having an integrated aggregation 
module which is capable of supporting a MDD having a 
multi-hierarchy dimensionality. 

These and other object of the present invention will 
become apparent hereinafter and in the claims to Invention 
set forth herein. 

BRIEF DESCRIPTION OF THE DRAWINGS 

In order to more fully appreciate the objects of the present 
invention, the following Detailed Description of the Illus- 
trative Embodiments should be read in conjunction with the 
accompanying Drawings, wherein: 

FIG. 1 is a schematic representation of a prior art infor- 
mation database system, wherein the present invention may 
be embodied. 

FIG. 2 is a schematic representation of the prior art data 
warehouse and OLAP system, wherein the present invention 
may be embodied. 

FIGS. 3A-3C are schematic representations of exemplary 
tables employed in a prior art Relational Database Manage- 
ment System (RDBMS); FIGS. 3B and 3C illustrate opera- 
tors (queries) on the table of FIG. 3A, and the result of such 
queries, respectively. 

FIG. 4A is a schematic representation of an exemplary 
dimensional schema (star schema) of a relational database. 

FIG. 4B is a schematic representation of tables used to 
implement the schema shown in FIG. 4A. 

FIG. 5 A is a schematic representation of an exemplary 
multidimensional schema (star schema). 

FIG. 5B is a schematic representation of tables used to 
implement the schema of FIG. 5A, including summary 
tables storing results of aggregation operations performed on 
the facts of the central fact table along the time-period 
dimension, in accordance with conventional teachings. 

FIG. 6A is a schematic representation of a generalized 
embodiment of an RDBMS of the present invention com- 
prising a relational database having an integrated multidi- 
mensional (MDD) aggregation module supporting queries 
from a plurality of clients, wherein the aggregation engine 
performs aggregation Unctions (e.g. summation of numbers, 
as well as other mathematical operations, such as 
multiplication, subtraction, division etc.) and non-relational 
multi-dimensional data storage functions. 

FIG. 6B is a schematic block diagram of the MDD 
aggregation module of the illustrative embodiment of the 
present invention shown in FIG. 6A. 

FIGS. 6C(1) and 6C(2), taken together, set forth a flow 
chart representation of the primary operations carried out 
within the RDBMS of the present invention when perform- 
ing data aggregation and related support operations, includ- 
ing the servicing of user-submitted (e.g. natural language) 
queries made on such aggregated database of the present 
invention. 

FIG. 6D is a flow chart representation of the primary 
operations carried out by the (DB) request serving mecha- 
nism within the MDD control module shown in FIG. 6B. 
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FIG. 6E is a schematic representation of the view mecha- of the MDDB of FIG. 6B, and the method of searching for 

oism of the RDBMS that enables users to query on the a queried data point therein using a simple binary search 

aggregated data generated and/or stored in the MDD Aggre- technique due to the data files ascending order; 

gation module according to the present invention. FIG. UA is a schematic representation of three exemplary 

FIG. 6F is a schematic representation of the trigger 5 muUi-hierarchical data structures for storage of data within 

mechanism of the RDBMS that enables users to query on the the MDDB of FIG. 6B, having three levels of hierarchy, 

aggregated data generated and/or stored in the IVTOD Aggre- wherein the first level representative of base data is com- 

gation module according to the present invention. posed of items A,B^, and G, the second level is composed 

FIG. 7A shows a separate -platform implementation of the of items C,E,H and I, and the third level is composed of a 

RDBMS system of the illustrative embodiment shown in single item D, which is common to all three hierarchical 

FIG. 6A, wherein the query handling, fact table(s) and structures. 

dictionary of the RDBMS resides on a separate hardware FIG. IIB is a schematic representation of an optimized 

platform and/or OS system from that used to run the MDD multi-hierarchical data stmcture merged from all three hier- 

Aggregation Module of the present invention. archies of FIG. IIA, in accordance with the principles of the 

FIG. 7B shows a common-platform implementation of the present invention. 

RDBMS system of the illustrative embodiment shown in FIG. 12 is a schematic representation showing the levels 

FIG. A, wherein the query handling, fact table(s) and die- of operations performed by the stand-alone Aggregation 

tionary of the RDBMS shares the same hardware platform Server of FIG. 6B, summarizing the different enabling 

and operating system (OS) that used to run the MDD components for carrying out the method of segmented 

Aggregation Module of the present invention. aggregation in accordance with the principles of the present 

FIG. 8A is a data table setting forth information repre- invention, 

sentalive of performance benchmarks obtained by the piG. 13 is a schematic representation of the RDBMS of 

shared-platform type implementation of the MDD Aggre- the present invention shown as a component of a central data 

gation Module of the aiustrative embodiment serving the warehouse, serving the data storage and aggregation needs 

conventional OLAP server (i.e. Oracle EXPRESS Server, of a ROLAP system (or other OLAP system), 
wherein the common hardware/software platform is realized 

using a Pentium II 450 Mhz, 1 GB RAM, 18 GB Disk, DETAILED DESCRIPTION OF THE 

running the Microsoft NT operating system (OS); PREFERRED EMBODIMENTS OF THE 

no. 9A is a schematic representation of the first stage in PRESENT INVENTION 
the method of segmented aggregation according to the ^° Referring now to FIG. 6 through FIG. 13, the preferred 

principles of the present invention, showing initial aggrega- embodiments of the method and system of the present 

lion along the 1st dimension. invention will be now described in great detail herein below. 

FIG. 9B is a schematic representation of the next stage in Through this document, the term "aggregation" and "pre- 
the method of segmented aggregation according to the 35 aggregation" shall be understood to mean the process of 

principles of the present invention, showing that any seg- summation of numbers, as well as other mathematical 

meni along dimension 1, such as the shown slice, can be operations, such as multiplication, subtraction, division etc. 

separately aggregated along the remaining dimensions, 2 n shall be understood that prc-aggregation operations occur 

and 3, and that in general, for an N dimensional system, the asynchronously with respect to the traditional query pro- 

seoorid stage involves aggregation in N-1 dimensions. The cessing operations. Moreover, the term "atomic data" shall 

principle of segmentation can be applied on the first stage as be understood to refer to the lowest level of data granularity 

well, however, only a large enough data will justify such a required for effective decision making. In the case of a retail 

sliced procedure in the first dimension. Actually, it is pos- merchandising manager, atomic data may refer to informa- 

stole to consider each segment as an N-1 cube, enabling tion by store, by day, and by item. For a banker, atomic data 

recursive computation. may ^e information by account, by transaction, and by 

FIG. 9C1 is a schematic representation of the Query branch. 

Directed Roll-up (QDR) aggregation method/procedure of general, the improved RDBMS system of the present 

the present mvention, showmg data aggregation starting invention excels in performing two distinct functions, 

from existmg basic data or previously aggregated data in the namely: the aggregation of data; and the handling of the 
first dimension (Dl), and such aggregated data being uti- 50 resulting data for "on demand" client use. Moreover, 

lized as a basis for QDR aggregation along the second because of improved data aggregation capabilities, the 

dimension (D2). RDBMS of the present invention can be employed in a wide 

FIG. 9C2 is a schematic representation of the Query range of applications, including Data Warehouses support- 
Directed Roll-up (QDR) aggregation method/procedure of ing OLAP systems and the like. For purposes of illustration, 
the present invention, showing initial data aggregation start- 55 initial focus will be accorded to the RDMS of the present 
ing from existing previously aggregated data in the second invention. 

third (D3), and continuing along the third dimension (D3), piG. 6A illustrates the primary components of an illus- 

and thereafter contmmng aggregation along the second trative embodimcm of the RDBMS of the present invention, 

dunension (D2). namely: support mechanisms including a query interface and 

FIG. lOA is a schematic representation of the "slice- eo query handler, a relational data store including one or more 

storage" method of storing sparse data in the disk storage fact tables and a meta-data store for storing a dictionary 

devices of the MDDB of FIG. 6B in accordance with the (catalogue), and an MDD Aggregation Module that stores 

principles of the present invention, based on an ascending- atomic daU and aggregated data in a non-relational MDD. It 

ordered index along aggregation direction, enabling fast should be noted that the RDBMS typically includes addi- 
retrieval of data. g5 ijonal components (not shown) that arc not relevant to the 

FIG. lOB is a schematic representation of the data orga- present invention. The query interface and query handler 

nization of data files and the directory file used in the storage service user-submitted queries (in the preferred 
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embodiment, SQL query statements) forwarded, for 
example, from a client machine over a network as shown. 
The query handler, fact table and meta-data store commu- 
nicate to the MDD Aggregation Module. Importantly, the 
query handler and integrated MDD Aggregation Module 
operates to provide for dramatically improved query 
response times for data aggregation operations and drill- 
downs. Moreover, it is an object of the present invention is 
to make user-querying of the non-relational MDDB no 
different than querying a relational table of the RDBMS, in 
a manner that minimizes the delays associated with queries 
that involve aggregation or drill down operations. This 
object is enabled by providing the novel RDBMS system 
and integrated aggregation mechanism of the present inven- 
tion. 

FIG. 6B shows the primary components of an illustrative 
embodiment of the MDD Aggregation Module of FIG. 6 A, 
namely: a base data loader for loading the directory and fact 
table(s) of the RDBMS; an aggregation engine for receiving 
dimension data and fact data from the base loader^ a multi- 
dimensional database (MDDB); a MDDB handler, an SQL 
handler that operates cooperatively with the query handler of 
the RDBMS to provide users with query access to the MDD 
Aggregation Module, and a control module for managing 
the operation of the components of the MDD aggregation 
module. The base data loader may load the directory and fact 
tables over a standard interface (such as OLDB, OLE-DB, 
ODBC, SQL, API, JDBC, etc.). In this case, the RDBMS 
and base data loader include components that provide com- 
munication of such data over these standard interfaces. Such 
interface components are well known in the art. For 
example, such interface components are readily available 
from Attunity Corporation, http://www.attunity.oom. 

During operation, the base data originates from the fact 
table(s) of the RDBMS. The core data aggregation opera- 
tions are performed by the Aggregation Engine; a Multidi- 
mensional Data Handler; and a Multidimensional Data Stor- 
age. The results of data aggregation are efficiently stored in 
a multidimensional data storage (MDDB), by the Data 
Handler. The SQL handler of the MDD Aggregation modtile 
services user-submitted queries (in the preferred 
embodiment, SQL query statements) forwarded from the 
query handler of the RDBMS. The SQL handler of the MDD 
Aggregation module may communicate with the query han- 
dler of the RDBMS over a standard interface (such as 
OLDB, OLE-DB, ODBC, SQL, API, JDBC. etc.). In this 
ca.se, the support mechanisms of the RDBMS and SQL 
handler include components that provide communication of 
such data over these standard interfaces. Such interface 
components are well known in the art. Aggregation (or drill 
down results) are retrieved on demand and relumed to the 
user. 

TVpicaUy, a user interacts with a client machine (for 
example, using a web-enabled browser) to generate a natural 
language query, that is communicated to the query interface 
of the RDBMS, for example over a network as shown. The 
query interface disintegrates the query, via parsing, into a 
series of requests (in the preferred embodiment, SQL 
statements) that are communicated to the query handler of 
the RDBMS. It should be noted that the functions of the 
query interface may be implemented in a module that is not 
part of the RDBMS (for example, in the client machine). The 
query handler of the RDBMS forwards requests that involve 
data stored in the MDD of the MDD Aggregation module to 
the SQL bander of the MDD Aggregation module for 
servicing. Each request specifics a set of n-dimensions. The 
SQL handler of the MDD Aggregation Module extracts this 
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set of dimensions and operates cooperatively with the MDD 
handler to address the MDDB using the set of dimensions, 
retrieve the addressed data from the MDDB, and return the 
results to the user via the query handler of the RDBMS. 

5 FIG. 6C(i) and 6C(ii) is a flow chart illustrating the 
operations of an illustrative RDBMS of the present inven- 
tion. In step 601, the base data loader of the MDD Aggre- 
gation Module loads the dictionary (or catalog) from the 
meta-data store in the RDBMS. In performing this function, 

10 the base data loader may utilize an adapter (interface) that 
maps the data types of the dictionary of the RDBMS (or that 
maps a standard data type used to represent the dictionary of 
the RDBMS) into the data types used in the MDD aggre- 
gation module. In addition, the base data loader extracts the 

15 dimensions from the dictionary and forwards the dimensions 
to the aggregation engine of the MDD Aggregation Module. 

In step 603, the base data loader loads the fact table(s) 
from the RDBMS. In performing this frinction, the base data 
loader may utilize an adapter (interface) that maps the data 
types of the fact table(s) of the RDBMS (or that maps a 
standard data type used to represent the fact table(s) of the 
RDBMS) into the data types used in the MDD Aggregation 
Module. In addition, the base data loader extracts the atomic 
data from the fact table, and forwards the atomic data to the 
aggregation engine. 

In step 605, the aggregation engine roils-up (aggregates) 
the atomic data (provided by the base data loader in step 
603) along at least one of the dimensions and operates 
cooperatively with the MDD handler to store the resultant 
aggregated data in the MDD database. A more detailed 
description of exemplary aggregation operations according 
to a preferred embodiment of the present invention is set 
forth below with respect to the QDR process of FIGS. 

35 9A-9C. 

In step 607, a reference is defined that provides users with 
the ability to query the data generated by the MDD Aggre- 
gation Module and/or stored in the MDDB of the MDD 
Aggregation Module. This reference is preferably defined 

40 using the Create View SQL statement, which allows the user 
to: i) define a table name (TN) associated with the MDD 
database stored in the MDD Aggregation Module, and ii) 
define a link used to route SQL statements on the table TN 
to the MDD Aggregation Module. In this embodiment, the 

45 view mechanism of the RDBMS enables reference and 
linking to the data stored in the MDDB of the MDD 
Aggregation Engine as illustrated in FIG. 6(E). A more 
detailed description of the view mechanism and the Create 
View SQL statement may be found in C. J, Date, "An 

50 Introduction to Database Systems," Addison-Wesley, Sev- 
enth Edition, 2000, pp. 289-326, herein incorporated by 
reference in its entirety. Thus, the view mechanism enables 
the query handler of the RDBMS system to forward any 
SQL query on table TN to the MDD aggregation module via 

55 the associated link. In an alternative embodiment, a direct 
mechanism (e.g., NA trigger mechanism) may be used to 
enable the RDBMS system to reference and link to the data 
generated by the MDD Aggregation Module and/or stored in 
the MDDB of the MDD Aggregation Engine as illustrated in 

50 FIG. 6F. A more detailed description of trigger mechanisms 
and methods may be found in C. J. Date, "An Introduction 
to Database Systems," Addison-Wesley, Seventh Edition, 
2000, pp. 250, 266, herein incorporated by reference in its 
entirety. 

65 In step 609, a user interacts with a client machine to 
generate a query, and the query is communicated to the 
query interface. The query interface generate one or more 
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SQL statements on the reference defined in step 607 (this 
reference refers to the data stored in the MDDB of the MDD 
Aggregation Module), and forwards the SQLstatement(s) to 
the query handler of the RDBMS. 

In step 611, the query handler receives the SQL statement 
(s); and optionally transforms such SQL statement(s) to 
optimize the SQL statement (s) for more efiScient query 
handling. Such transformations are well known in the art. 
For example, see Kimball, Aggregation Navigation With 
(Almost) No MetaData", DBMS Data Warehouse 
Supplement, August 1996, available at http:// 
www.dbmsmag.com/9608d54.html. 

In step 613: the query handler determines whether the 
received SQL statement(s) [or transformed SQL statement 
(s)] is on the reference generated in step 607. If so, operation 
continues to step 615; otherwise normal query handling 
operations continue is step 625 

In step 615, the received SQL statement(s) [or trans- 
formed SQLstatement(s)] is routed to the MDD aggregation 
engine for processing in step 617 using the link for the 
reference as described above with respect to step 607. 

In step 617, the SQL statement(s) is received by the SQL 
handler of the MDD Aggregation Module, wherein a set of 
one or more N-dimensional coordinates are extracted from 
the SQL statement. In performing this function, SQL handler 
may utilize an adapter (interface) that maps the data types of 
the SQL statement issued by query handler of the RDBMS 
(or that maps a standard data type used to represent the SQL 
statement issued by query handler of the RDBMS) into the 
data types used in the MDD aggregation module. 

In step 619, the set of N-dimensional coordinates 
extracted in step 617 are used by the MDD handler to 
address the MDDB and retrieve the corresponding data from 
the MDDB. 

Finally, in step 621, the retrieved data is returned to the 
user via the RDBMS (for example, by forwarding the 
retrieved data to the SQL handler, which returns the 
retrieved data to the query handler of the RDBMS system, 
which returns the results of the user-submitted query to the 
user via the client machine), and the operation ends. 

It should be noted that the facts (base data), as it arrives 
from RDBMS, may be analyzed and reordered to optimize 
hierarchy handling, according to the unique method of the 
present invention, as described later with reference to FIGS. 
IIA and UB. 

Moreover, the MDD control module of the MDD Aggre- 
gation Module preferably administers the aggregation pro- 
cess according to the method illustrated in FIGS. 9 A and 9B. 
llius, in accordance with the principles of the present 
invention, data aggregation within the RDBMS can be 
carried out either as a complete pre-aggregalion process, 
where the base data is fully aggregated before commencing 
querying, or as a query directed roll-up (QDR) process, 
where querying is allowed at any stage of aggregation using 
the "on-the-fly" data aggregation process of the present 
invention. The QDR process will be described hereinafter in 
greater detail with reference to FIG. 9C. The response to a 
request (i.e. a basic component of a client query) requiring 
"on-lhe-fly** data aggregation, or requiring access to pre- 
aggregated resuh data via the MDD handler is provided by 
a query/request serving mechanism of the present invention 
within the MDD control module, the primary operations of 
which are illustrated in the flow chart of FIG. 6D. The 
fimction of the MDD Handler is to handle multidimensional 
data in the storage(s) module in a very cfGcicnt way, 
according to the novel method of the present invention. 
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which will be described in detail hereinafter with reference 
to FIGS. lOAandlOB. 

The SQL handling mechanism shown io FIG. 6D is 
controlled by the MDD control module. Requests are 
5 queued and served one by one. If the required data is already 
pre-calculated, then it is retrieved by the MDD handler and 
returned to the client. Otherwise, the required data is cal- 
culated ''on-the-fly" by the aggregation engine, and the 
result moved out to the client, while simultaneously stored 
by the MDD handler, shown in FIG. 6C. 

FIGS. 7Aand 7B outline two different implementations of 
the RDBMS system of the present invention. In both 
implementations, the query handler of the RDBMS system 
supplies aggregated results retrieved from the MDD to a 
client. 

FIG. 7A shows a separate-platform implementation of the 
RDBMS system of the illustrative embodiment shown in 
FIG. 6A, wherein the support mechanisms (query handling) 
and relational data store (fact table(s) and dictionary) of the 
RDBMS resides on a separate hardware platform and/or OS 
20 system from that used to run the MDD Aggregation Module. 
In this type of implementation, it is even poss^le to run parts 
of the RDBMS system and the MDD Aggregation Module 
on different-type operating systems (e.g. NT, Unix, MAC 
OS). 

FIG. 7B ^ows a common-platform implementation of the 
RDBMS system of the illustrative embodiment shown in 
FIG. A, wherein the support mechanisms (query handling) 
and the relational data store (fact table(s) and dictionary) of 
the RDBMS shares the same hardware platform and oper- 
ating system (OS) that used to run the MDD Aggregation 
Module. 

FIG. 8A shows a table setting forth the benchmark results 
of the MDD aggregation module, in accordance with the 
principles of the present invention. The platform and OS is 
realized using a Pentium II 450 Mhz, 1 GB RAM, 18 GB 
Disk, running the Microsoft NT operating system. The six 
(6) data sets shown in the table differ in number of 
dimensions, number of hierarchies, measure of sparcity and 
^ data size. A comparison with ORACLE Express, a major 
OLAP server, is made. It is evident that the MDD aggrega- 
tion module of the present invention outperforms currently 
leading aggregation technology by more than an order of 
magnitude. 

Preferably, the MDD aggregation module of the RDBMS 
of the present invention supports a segmented data aggre- 
gation method as described in FIGS. 9A through 9C2. These 
figures outline a simplified setting of three dimensions only; 
however, the following analysis applies to any number of 
dimensions as well. 

The data is being divided into autonomic segments to 
minimize the amount of simultaneously handled data. The 
initial aggregation is practiced on a single dimension only, 
while later on the aggregation process involves all other 
dimensions. 

At the first stage of the aggregation method, an aggrega- 
tion is performed along dimension 1. The first stage can be 
performed on more than one dimension. As shown in FIG. 
9A, the space of the base data is expanded by the aggrega- 
50 tion process. 

In the next stage shown in FIG. 9B, any segment along 
dimension 1, such as the shown slice, can be separately 
aggregated along the remaining dimensions, 2 and 3. In 
general, for an N dimensional system, the second stage 
65 involves aggregation in N-1 dimensions. 

The principle of data segmentation can be applied on the 
first stage as well. However, only a large enough data set will 
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justify such a sliced procedure in the first dimension. 
Actually, it is possible to consider each segment as an N-1 
cube, enabling recursive computation. 

It is imperative to get aggregation results of a specific 
slice before the entire aggregation is completed, or 5 
alternatively, to have the roll-up done in a particular 
sequence, l^his novel feature of the aggregation method of 
the present invention is that it allows the querying to begin, 
even before the regular aggregation process is 
accomplished, and still having fast response. Moreover, in lo 
relational OLAP and other systems requiring only partial 
aggregations, the QDR process dramatically speeds up the 
query response. 

The QDR process is made feasible by the slice-oriented 
roll-up method of the present invention. After aggregating 
the first dimension(s), the multidimensional space is com- 
posed of independent multidimensional cubes (slices). 
These cubes can be processed in any arbitrary sequence. 

Consequently the aggregation process of the present 
invention can be monitored by means of files, shared 
memory sockets, or queues to statically or dynamically set 
the roll-up order. 

In order to satisfy a single query, before the required 
aggregation result has been prepared, the QDR process of 
the present invention involves performing a fast on-the-fly 
aggregation (roll-up) involving only a thin slice of the 
multidimensional data. 

FIG. 9C1 shows a slice required for building-up a roll-up 
result of the 2"*^ dimension. In case 1, as shown, the 
aggregation starts from an existing data, either basic or 
previously aggregated in the first dimension. This data is 
utilized as a basis for QDR aggregation along the second 
dimension. In case 2, due to lack of previous data, a QDR 
involves an initial slice aggregation along dimension 3, and 
thereafter aggregation along the 2"'' dimension. 

FIG. 9C2 shows two corresponding QDR cases for gain- 
ing results in the 3d dimension. Cases 1 and 2 differ in the 
amount of initial aggregation required in 2"^ dimension. 

FIG. lOAillustrales the "Slice-Storage" method of storing 40 
sparse data on storage disks. In general, this data storage 
method is based on the principle that an ascending-ordered 
index along aggregation direction, enables fast retrieval of 
data. FIG. lOA illustrates a unit-wide slice of the multidi- 
mensional cube of data. Since the data is sparse, only few 45 
non-NA data points exist, 'lliese points are indexed as 
follows. The Data File consists of data records, in which 
each N-1 dimensional slice is being stored, in a separate 
record. These records have a varying length, according to the 
amount of non-NA stored points. For each registered point 59 
in the record, IND^^ stands for an index in a n-dimensional 
cube, and Data stands for the value of a given point in the 
cube. 

FIG. lOB illustrates a novel method for randomly search- 
ing for a queried data point in the MDD of the RDBMS of 55 
the present invention by using a novel technique of orga- 
nizing data files and the directory file used in the storage of 
the MDD, so that a simple binary search technique can then 
be employed within the aggregation module of the RDMB. 
According to this method, a metafile termed DIR File, keeps 60 
pointers to Data Files as well as additional parameters such 
as the start and end addresses of data record (IND^, IND„), 
its location within the Data File, record size (n), file's 
physical address on disk (D_Patb), and auxiliary informa- 
tion on the record (Flags). 6S 

A search for a queried data point is then performed by an 
access to the DIR file. The search along the file can be made 
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using a simple binary search due to file's ascending order. 
When the record is found, it is then loaded into main 
memory to search for the required point, characterized by its 
index IND^^ The attached Data field represents the queried 
value. In case the exact index is not found, it means that the 
point is a NA. 

FIGS. HA and IIB illustrate a novel method performed 
by the MDD aggregation module of the RDBMS of the 
present invention for pre-processing data such that multi- 
hierarchies in multi-tderarchical structures are optimally 
merged. According to the devised method, the inner order of 
hierarchies within a dimension is optimized, to achieve 
ef&cient data handling for summations and other mathemati- 
cal formulas (termed in general "Aggregation"). The order 
of hierarchy is defined externally. It is brought from a data 
source to the stand-alone aggregation engine, as a descriptor 
of data, before the data itself. In the illustrative embodiment, 
the method assumes hierarchical relations of the data, as 
shown in FIG. IIA. The way data items are ordered in the 
memory space of the Aggregation Server, with regard to the 
hierarchy, has a significant impact on its data handling 
efficiency. 

Notably, when using prior art techniques, multiple han- 
dling of data elements, which occurs when a data element is 
accessed more than once during aggregation process, has 
been hitherto imavoidable when the main concern is to 
effectively handle the sparse data. The data structures used 
in prior art data handling methods have been designed for 
fast access to a available data (not NA data). According to 
prior art techniques, each access is associated with a timely 
search and retrieval in the data stmcture. For the massive 
amount of data typically accessed from a Data Warehouse in 
an OLAP application, such multiple handling of data ele- 
ments has significantly degraded the efficiency of prior art 
data aggregation processes. When using prior art data han- 
dling techniques, the data element D shown in FIG. 11 A 
must be accessed three times, causing poor aggregation 
performance. 

In accordance with the present invention, the MDD aggre- 
gation module of the RDBMS performs the loading of base 
data and the aggregation and storage of the aggregated data 
in a way that limits the access of to a singular occurrence, 
as opposed to multiple occurrences as taught by prior art 
methods. According to the present invention, elements of 
base data and their aggregated results are contiguously 
stored in a way that each element will be accessed only once. 
'ITiis particular order allows a forward-only handling, never 
backward. Once a base data element is stored, or aggregated 
result is generated and stored, it is never to be retrieved again 
for further aggregation. As a result the storage access is 
minimized. This way of singular handling greatly elevates 
the aggregation efficiency of large data bases. The data 
element D, as any other element, is accessed and handled 
only once. 

FIG. IIA shows an example of a multi-hierarchical data- 
base structure having 3 hierarchies. As shown, the base data 
includes the items A,B,F, and G., The second level is 
composed of items C,£,H and I. Ihe third level has a single 
item D, which is common to all three hierarchical structures. 
In accordance with the method of the present invention, a 
minimal computing path is always taken. For example, 
according to the method of the present invention, item D will 
be calculated as part of structure 1, requiring two math- 
ematical operations only, rather than as in structure 3, which 
would need four mathematical operations. FIG. IIB depicts 
an optimized stnicmre merged from all three hierarchies. 

FIG. 12 sununarizes the different enabling components 
for segmented aggregation. The minimized operations in 
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handling multi-hierarchies need analysts of the base data. It generated in response to user input, said query servic- 

greatly optimizes data handling and contribute to aggrega- ing mechanism comprising: 

tion speed. Based on this technology loading and indexing a reference generating mechanism for generating a 
operations become very efficient, minimizing memory and user-defined reference to aggregated fact data gen- 
storage access, and speeding up storing and retrieval opera- 5 erated by the aggregation module; and 
tions. On top of all the enabling technologies is the seg- a query processing mechanism for processing a given 
mented aggregation technique, not just outperforming by query statement, wherein, upon identifying that the 
orders of magnitude the prior-art aggregation algorithms, given query statement Is on said user-defined 
but also enabling the unique QDR which waves out the need reference, communicates with said aggregation mod- 
of waiting for full pre-aggregation. ule over an interface therebetween to retrieve por- 
FIG. 13 shows the improved RDBMS of the present tions of aggregated fact data pointed to by said 
invention as a component of a data warehouse, serving the reference that are relevant to said given query state- 
data storage and aggregation needs of a ROLAP system (or ment. 

other OLAP systems alike). Importantly, the improved 2. The RDBMS of claim 1, wherein said aggregation 

RDBMS of the present invention provides flexible, high- module includes a query haiidling mechanism for receiving 

performance access and analysis of large volumes of com- ^^^^ statements, and wherem communication between said 

plex and interrelated data. Moreover, the improved Data ^^^^ processing mechanism and said query handling 

«7 1- TiTAr>»*o r .u . • -1. mechanism is accomplished by forwarding the given query 

Warehouse RDBMS of the present invention can simulta- et,t««r,««t tu^ k.„^i,«rr «,»^t„«;eS. «f tK» 

1 ji'or . 1 • J r 1 • . / J . . Statement to the query handung mechanism 01 the aggrega- 

neously serve many different kinds of clients (e.g. data mart, lion module "» » 00-0 

OLAP URL) and has the power of delivering an enterprise- 3. The RDBMS of claim 2, wherein said query handling 

wide data storage and aggregation m a cost-effective way mechanism extracts dimensions from the received query 

T^is kind of system eliminates redundan^^ statement and forwards the dimensions to the storage 

chents, dehvenng scalability and flexibility. Moreover the ^^^^^^^ ^^^^^ ^.j ^^^^^^ ^^^^^ loc^ti^^ng 

improved RDBMS of the present invention can be used as non-relational multi-dimensional data store based 
the data store component of in any informational database ^5 upon the forwarded dimensions and returns the retrieved 

system as descnbed above, mcludmg data analysis programs ^j^j^ ^ack to the query servicing mechanism for communi- 

such as spread-sheet modeling programs, serving the data cation to the user 

storage and aggregation needs of such systems. 4 ^he RDBMS of claim 1, wherein said aggregation 

Functional Advantages Gained By The Improved module includes a data loading mechanism for loading at 
RDBMS Of The Present laveotion 30 least fact data from the relational data store, an aggregation 

n,e features of ihe RDBMS of the present invention. aggregating the fact data and an storage handler 

provides for dramatically improved response time in ban- for slormg the fact data and resultant aggregated fact data in 

dling queries issued to the RDBMS that involve Ihe non-relational multi^menaond ^ 

aggregation, thus enabling enterprise-wide centralized ^-^J^ ^P^^^ °^ ^'f™ 4. wherein said aggregation 

aggregation. Moreover, in the preferred embodiment of the 1°8^*= "P?" determining that 

present invention, users can query the aggregated data in an non-relational mulu-dmiensional data store does not 

manner no different than traditional queries on an RDBMS. daU required to service the given query statement, 

T, .ujco .J* .• 1 ju.u controls the data loading mechanism and aggregation engine 

The method of Segmented Aggregation employed by the . . . i . r . j . • j . ■ .l ■ 

1 nnniico r .u . • ? -ji n •u-i-. to aggregate at least fact data required to service the given 

novel RDBMS of the present mvention provides flexibility, „ ^ . . .iTZ ji. 
. t VI'. .u i^-i: en r»' . J A 10 query statement and controls the aggregation module to 

scalability, Ihe capability of Query Directed Aggregation, . .. .u ■ ■ 

and speed improvement. "ss- b . aggregated data back to the query servicing 

..'^ ... J , « J . mechanism for communication to the user. 

°^ Quenr Directed Aggregation ^ ^^^^ „f ^^^^ j combination with a data 

(QDR) employed by the novel RDBMS of the present ^ ^^^^ ^„ ^^AP system, 

invention minimizes the data handhng operations in mulu- ^ j^pQ^g of claim 6. wherein said OLAP system is 

hierarchy data structures, eliminates the need to wait for full ^ ROLAP system 

aggregation 10 be complete, and provides for build-up of g Rj^gj^g ^j^^ ^ ^ enterprise wide 

aggregated data required for fiiU aggregaUon. ^^^^ warehouse that interfaces to a pluraUty of information 

It IS understood that the System and Method of the technology systems, 

illustrative embodiments described herein above may be 50 9 rdbMS of claim 1, for use as a database store in 

modified in a variety of ways which will become readily informational database system. 

apparent to those skilled in the art of having the benefit of iq j^^ rqBMS of claim 9, wherein said informational 

the novel teachings disclosed herein. All such modifications database system is a spreadsheet modeling program, 

and variations of the iUustrative embodiments thereof shall ^ j^^ RDBMS of claim 1, wherein said query state- 
be deemed to be within the scope and spirit of the present 55 ^^^3 generated by a query interface in response to 

mvention as defined by the claims to Invention appended communication of a natural language query communicated 

from a client machine. 

What IS claimed is: 12. The RDBMS of claim 11, wherein said client machine 

1. A relaUonal database management system (RDBMS) comprises a web-enabled browser to communicate said 

comprismg: natural language query to the query interface, 

a relational data store storing fact data; 13. jhe RDBMS of claim 1, wherein said interface that 

an aggregation module, operatively coupled to the rela- provides communication between said query processing 

tional data store, for aggregating the fact data and mechanism and said aggregation module comprises a stan- 

storing the resultant aggregated data in a non-relational dard interface. 

multi-dimensional data store; 55 14. The RDBMS of claim 13, wherein said standard 

a query servicing mechanism, operatively coupled to the interface is selected from the group consisting of OLDB, 

aggregation module, for servicing query statements OLE-DB, ODBC, SQL, JDBC. 
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15. In a relational database management system 
(RDBMS) comprising a relational data store storing fact 
data, a method for aggregating the fact data and providing 
query access to the aggregated data comprising the steps of: 

providing an integrated aggregation module, operalively 5 
coupled to the relational data store, for aggregating the 
fact data and storing the resultant aggregated data in a 
non-relational multi-dimensional data store; 

in response to user input, generating a reference to aggre- 
gated fact data generated by the aggregation module; 
and 

processing a given query statement generated in response 
to user input, wherein, upon identifying that the given 
query statement is on said reference, conununicating 
with said integrated aggregation module over an inter- 
face operably coupled thereto to retrieve [retrieving] 
from the integrated aggregation module portions of 
aggregated fact data pointed to by said reference that 
are relevant to said given query statement. 

16. The method of claim 15, further comprising the step 
of extracting dimensions from the given query statement, 
accessing locations of the non-relational multi-dimensional 
data store based upon the extracted dimensions, and return- 
ing the retrieved data back to the user. 

17. The method of claim 15, wherein said aggregation 
module includes a data loading mechanism for loading at 
least fact data from the relational data store, an aggregation 
engine for aggregating the fact data and an storage handler 
for storing the fact data and resultant aggregated fact data in 
the non-relational multi-dimensional data store. 

18. The method of claim 17, wherein said aggregation 
module, upon determining that the non-relational multi- 
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dimensional data store does not contain data required to 
service the given query statement, controls the data loading 
mechanism and aggregation engine to aggregate at least fact 
data required to service the given query statement and 
controls the aggregation module to return the aggregated 
data back to the user. 

19. The method of claim 15, further comprising the step 
of performing data analysis operations on the retrieved data 
as part of an OLAP system. 

20. The method of daim 19, RDBMS of claim 6, wherein 
said OLAP system is a ROLAP system. 

21. The method of claim IS, wherein said RDBMS is used 
as an enterprise wide data warehouse that interfaces to a 
plurality of information technology systems. 

22. The method of claim 15, wherein said RDBMS is uses 
as a daubase store in an informational database system. 

23. The method of claim 22, wherein said informational 
database system is a spread-sheet modeling program. 

24. The method of claim 15, wherein said query state- 
ments are generated by a query interface in response to 
communication of a natural language query conununicated 
from a client machine. 

25. 'ITie method of claim 24, wherein said client machine 
comprises a web-enabled browser to communicate said 
natural language query to the query interface. 

26. The method of claim 25, wherein said interface that is 
operably coupled to said aggregation module comprises a 
standard interface. 

27. The method of claim 26, wherein said standard 
interface is selected from the group consisting of OLDB, 
OLE-DB, ODBC, SQL, JDBC. 

* * * * * 
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